Intro to Data Science
Anton Kalén
University of Skövde
Sep 16, 2022
Mathematics is the logic of certainty;
probability is the logic of uncertainty.
Constants
\(\pi = 3.1415927\)
Deterministic variables
\(\text{VAT} = 0.25 \times \text{price}\)
Will this person buy or not?
Number of visitors tomorrow
Height of the next person coming
Total weight of the next transport
You work at an online retailer and are tasked with predicting whether a visitor will buy the product or not.
In groups of 2–4, discuss which sources of uncertainty make it hard to be sure whether the visitor will buy. 5 min.
The retailer has 3 visitors per day. How many of them will buy a product?
Sample space
\(\Omega = \{\text{nnn}, \text{bnn}, \text{nbn}, \text{nnb}, \text{bbn}, \text{bnb}, \text{nbb}, \text{bbb} \}\)
\(\text{n} = \text{no buy}\)
\(\text{b} = \text{buy}\)
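The sample space above can be generated mechanically. The following is a minimal Python sketch; the variable name `sample_space` is chosen here for illustration only.

```python
from itertools import product

# Each of the 3 visitors either buys ("b") or does not buy ("n"),
# so an outcome is a string of three letters.
sample_space = ["".join(seq) for seq in product("nb", repeat=3)]

print(len(sample_space))  # 2 ** 3 = 8 outcomes
print(sample_space)
```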
Random Variable
Let \(X\) be the number of buys in a day.
\(X : \Omega \to \mathbb{R}\)
\(X(\text{nnn}) = 0\)
\(X(\text{bnn}) = 1\)
\(X(\text{nbn}) = 1\)
\(X(\text{nnb}) = 1\)
\(X(\text{bbn}) = 2\)
\(X(\text{bnb}) = 2\)
\(X(\text{nbb}) = 2\)
\(X(\text{bbb}) = 3\)
Probability function
\(P(X)\)
\(P : \Omega_X \to [0, 1]\)
\(P(X = 0) = \frac{1}{8} = 0.125\)
\(P(X = 1) = \frac{3}{8} = 0.375\)
\(P(X = 2) = \frac{3}{8} = 0.375\)
\(P(X = 3) = \frac{1}{8} = 0.125\)
These values assume that each visitor buys with probability 0.5, independently of the other visitors.
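These probabilities can be checked by counting outcomes. The sketch below assumes a buy probability of 0.5 per visitor, so that all 8 outcomes are equally likely; the names `sample_space`, `counts` and `pmf` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

sample_space = ["".join(seq) for seq in product("nb", repeat=3)]

# The random variable X maps each outcome to its number of buys.
# With equally likely outcomes, P(X = k) is the share of outcomes
# that contain exactly k buys.
counts = Counter(outcome.count("b") for outcome in sample_space)
pmf = {k: Fraction(counts[k], len(sample_space)) for k in sorted(counts)}

for k, prob in pmf.items():
    print(f"P(X = {k}) = {prob} = {float(prob)}")
```

The counting argument only works because the outcomes are equally likely. For any other buy probability, the outcomes must be weighted, which is what the binomial distribution does.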
\(X\) — Random variable (r.v.)
\(\Omega_X = \{x_1, \ldots, x_n \}\) — Sample space of r.v. \(X\)
\(P(X)\) — Probability function on r.v. \(X\)
\(P(X = x_1)\) or \(P(x_1)\) — Probability of \(X\) taking value \(x_1\)
We can represent random variables using probability distributions.
Distribution of r.v.
\(X \sim \text{Bin}(n, p)\)
\(n = \text{number of trials}\).
\(p = \text{probability of success}\).
The retailer has 3 visitors per day. The probability of each visitor buying a product is 0.5. How many of them will buy a product?
\(n = 3\)
\(p = 0.5\)
\(X \sim \text{Bin}(3, 0.5)\)
The retailer has 3 visitors per day. The probability of each visitor buying a product is 0.5. How many of them will buy a product?
The retailer has 10 visitors per day. The probability of each visitor buying a product is 0.5. How many of them will buy a product?
The retailer has 10 visitors per day. The probability of each visitor buying a product is 0.3. How many of them will buy a product?
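The three scenarios can be compared numerically. The sketch below uses the standard binomial formula \(P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}\), which is not spelled out on the slides; the helper `binom_pmf` is a name introduced here for illustration.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# (n, p) for the three scenarios described above.
scenarios = [(3, 0.5), (10, 0.5), (10, 0.3)]

for n, p in scenarios:
    pmf = [round(binom_pmf(k, n, p), 4) for k in range(n + 1)]
    print(f"Bin({n}, {p}): {pmf}")
```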
Probability mass function (PMF)
\(p_X(x_i) = P(X = x_i)\)
Cumulative distribution function (CDF)
\(F_X(x_i) = P(X \leq x_i)\)
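For a discrete random variable, the CDF is the running sum of the PMF. A minimal sketch for \(X \sim \text{Bin}(3, 0.5)\), with the variable names chosen for illustration:

```python
from itertools import accumulate
from math import comb

n, p = 3, 0.5

# PMF: P(X = k) for k = 0, ..., n.
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# CDF: P(X <= k) is the sum of the PMF over all values up to k.
cdf = list(accumulate(pmf))

for k in range(n + 1):
    print(f"k = {k}: P(X = k) = {pmf[k]}, P(X <= k) = {cdf[k]}")
```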
Discrete distribution for counting the number of successes in a fixed time interval, when there is a large number of trials and each trial has a low probability of success.
For example, the online retailer has thousands of visitors each day and only a small proportion ends up buying.
Poisson distribution
\(X \sim \text{Pois}(\lambda)\)
\(\Omega_X = \mathbb{N}_0 = \{0, 1, 2, \ldots\}\) (Possible values are the non-negative integers)
\(\lambda\) — Mean and variance of \(X\)
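A short numerical check of these properties, assuming an average of 4 sales per day (\(\lambda = 4\)). It uses the standard Poisson formula \(P(X = k) = \lambda^k e^{-\lambda} / k!\), which is not spelled out on the slides; `pois_pmf` is an illustrative helper name.

```python
from math import exp, factorial

def pois_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Pois(lam)."""
    return lam**k * exp(-lam) / factorial(k)

lam = 4
for k in range(9):
    print(k, round(pois_pmf(k, lam), 4))

# The support is infinite, so the sums are truncated at k = 99,
# where the remaining probability mass is negligible for lam = 4.
ks = range(100)
mean = sum(k * pois_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * pois_pmf(k, lam) for k in ks)
print(mean, var)  # both approximately equal to lam
```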
The average number of daily sales is 4.
The average number of daily sales is 10.
What is the probability that a visitor buys a product, given that it is a rainy day?
| | Buy | No Buy |
|---|---|---|
| Sun | 13 | 39 |
| Rain | 26 | 104 |
Random variables & sample space
Finish visit: \(X\), \(\Omega_X = \{\text{Buy}, \text{No Buy}\}\)
Weather: \(Y\), \(\Omega_Y = \{\text{Sun}, \text{Rain}\}\)
Marginal probability
Marginal probabilities of \(X\): \(P(X = \text{Buy})\), \(P(X = \text{No Buy})\)
Marginal probabilities of \(Y\): \(P(Y = \text{Sun})\), \(P(Y = \text{Rain})\)
Conditional probability \(P(X|Y)\)
Probability of \(X\) given Sun: \(P(X|Y = \text{Sun})\), or \(P(X|\text{Sun})\)
Probability of Buy, given Rain: \(P(X = \text{Buy}| Y = \text{Rain})\), or \(P(\text{Buy}|\text{Rain})\)
Joint probability \(P(X, Y)\)
Probability of Buy and Rain together: \(P(X = \text{Buy}, Y = \text{Rain})\), or \(P(\text{Buy}, \text{Rain})\)
Marginal probability \(P(X = \text{Buy})\)
Marginal probability \(P(Y = \text{Sun})\)
Conditional probability \(P(X = \text{Buy}|Y = \text{Rain})\)
Conditional probability \(P(X = \text{Buy}|Y = \text{Sun})\)
Joint probability \(P(X = \text{Buy}, Y = \text{Rain})\)
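The five probabilities listed above can be computed directly from the counts in the Buy / No Buy by Sun / Rain table. This is a minimal sketch; the function names `joint`, `marginal_x`, `marginal_y` and `conditional` are introduced here for illustration.

```python
from fractions import Fraction

# Visitor counts from the table, keyed by (weather, visit outcome).
counts = {
    ("Sun", "Buy"): 13, ("Sun", "No Buy"): 39,
    ("Rain", "Buy"): 26, ("Rain", "No Buy"): 104,
}
total = sum(counts.values())  # 182 visitors in total
weathers = ("Sun", "Rain")
outcomes = ("Buy", "No Buy")

def joint(x, y):
    """P(X = x, Y = y): share of all visitors in one cell."""
    return Fraction(counts[(y, x)], total)

def marginal_x(x):
    """P(X = x): sum the joint probabilities over all weather values."""
    return sum(joint(x, y) for y in weathers)

def marginal_y(y):
    """P(Y = y): sum the joint probabilities over all visit outcomes."""
    return sum(joint(x, y) for x in outcomes)

def conditional(x, y):
    """P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)."""
    return joint(x, y) / marginal_y(y)

print("P(Buy)        =", marginal_x("Buy"))           # 3/14
print("P(Sun)        =", marginal_y("Sun"))           # 2/7
print("P(Buy | Rain) =", conditional("Buy", "Rain"))  # 1/5
print("P(Buy | Sun)  =", conditional("Buy", "Sun"))   # 1/4
print("P(Buy, Rain)  =", joint("Buy", "Rain"))        # 1/7
```

Note that \(P(\text{Buy}|\text{Rain}) = 0.2\) differs from \(P(\text{Buy}, \text{Rain}) \approx 0.143\): the conditional probability divides by the number of rainy-day visitors only, while the joint probability divides by all visitors.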
Joint probability \(P(X, Y)\)
Random variables that can take any real value in an interval
\(X \sim \mathcal{N}(\mu, \sigma)\)
\(\mu\) — Mean of \(X\)
\(\sigma\) — Standard deviation of \(X\)
\(\Omega = \mathbb{R} = (-\infty, \infty)\)
Probability density function
\(\displaystyle f(x)={\frac {1}{\sigma {\sqrt {2\pi }}}}e^{-{\frac {1}{2}}\left({\frac {x-\mu }{\sigma }}\right)^{2}}\)
\(\displaystyle \int_{-\infty}^{\infty} f(x) \, dx = 1\)
Cumulative distribution function (CDF)
\(\displaystyle F(x) = P(X \leq x) = \int_{-\infty}^{x} f(t) \, dt\)
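Both properties can be checked numerically. The sketch below implements the density formula and approximates the integrals with the trapezoidal rule, using \(\mu = 3\) and \(\sigma = 2\) as an example. The helpers `normal_pdf` and `integrate` are illustrative, and the infinite lower limit is replaced by \(\mu - 10\sigma\), where the density is negligible.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def integrate(f, a: float, b: float, steps: int = 20_000) -> float:
    """Trapezoidal-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    inner = sum(f(a + i * h) for i in range(1, steps))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))

mu, sigma = 3, 2
f = lambda x: normal_pdf(x, mu, sigma)
lower = mu - 10 * sigma  # stands in for minus infinity

# Total area under the density: should be very close to 1.
area = integrate(f, lower, mu + 10 * sigma)

# CDF at x = 4 as the area to the left of 4, compared with the
# value from the standard library's NormalDist.
cdf_at_4 = integrate(f, lower, 4)
print(area)
print(cdf_at_4, NormalDist(mu, sigma).cdf(4))
```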
Probability density function of \(\mathcal{N}(0, 1)\).
Probability density function of \(\mathcal{N}(0, 2)\).
Probability density function of \(\mathcal{N}(3, 2)\).
Cumulative distribution function of \(\mathcal{N}(0, 1)\).
Cumulative distribution function of \(\mathcal{N}(0, 2)\).
Cumulative distribution function of \(\mathcal{N}(3, 2)\).
Given \(X \sim \mathcal{N}(0, 1)\), calculate \(P(X \leq 1) = F_X(1)\).
Given \(X \sim \mathcal{N}(0, 1)\), calculate the probability that \(X \in [-1, 1]\).
Simulate 1000 draws from \(X \sim \mathcal{N}(0, 1)\).
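A minimal sketch of these three tasks, assuming the intended distribution is the standard normal (mean 0, standard deviation 1). It uses `statistics.NormalDist` from the Python standard library; the seed value is arbitrary and only makes the draws reproducible.

```python
from statistics import NormalDist, mean, stdev

X = NormalDist(mu=0, sigma=1)

# P(X <= 1) is the CDF evaluated at 1.
p_le_1 = X.cdf(1)

# P(-1 <= X <= 1) is the difference between two CDF values.
p_interval = X.cdf(1) - X.cdf(-1)

# 1000 simulated draws; their sample mean and standard deviation
# should be close to, but not exactly, 0 and 1.
draws = X.samples(1000, seed=1)

print(round(p_le_1, 4))      # 0.8413
print(round(p_interval, 4))  # 0.6827
print(mean(draws), stdev(draws))
```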
Similar structure to Lab 1: R programming
You need to hand in a correct solution with working code for all exercises.
Work individually:
You can only use base R, no external libraries.
No copy paste from internet or other sources.
?distributions, ?plot …